ABSTRACT


Priority decisions arise whenever limited facilities must be apportioned 
among competitive demands for service. Broadly viewed, even the familiar 
first-come-first-served discipline is a priority rule. It favors the longest- 
waiting user, and guards against excessive delays. Other priority rules, 
such as shortest-job-next, are keyed instead to considerations of operating 
efficiency. Urgency of request is still another common consideration. Since 
these considerations often conflict, the priority rule serves as mediator. 

Use of a common cost measure can help effect this mediation, as results
from recent job-shop simulations illustrate.


A priority operation of contemporary interest is scheduling a time- 
shared computer among its concurrent users. Service requirements are not 
known in advance of execution. To keep response times short for small re- 
quests, service intervals are partitioned and segments are served separately 
in round-robin fashion. A mathematical analysis pinpoints the tradeoff be- 
tween overhead and discrimination implicit in this procedure, and allows 
alternate strategies to be costed. Extensions of the simple round-robin pro- 
cedure are suggested, the objectives of time sharing are reviewed, and 
implications are drawn for the design of future priority and pricing systems. 

INTRODUCTION


Granting preferred treatment is a basic part of man's way of life, demo- 
cratic principles notwithstanding. Whenever demand temporarily exceeds 
supply, or a limited facility must be rationed, or only one of several can be 
served at a time, someone accepted means someone else refused or deferred. 
Implicitly or explicitly, a question is being posed and resolved. In formal 
terms, we refer to the question as a priority problem and its solution as a 
priority decision. A strategy or objective plan for routinely making priority 
decisions, we call a priority rule. 


The priority problem arises in a variety of contexts. In a supermarket, 
customers with small orders are ushered to a special check-out counter. 
This expedites their service, and has the desirable secondary benefit of re-
ducing congestion in the store. Giving priority to small requests is often a
very good idea, but not always easy, as later discussions will illustrate.


Importance or urgency is a second factor entering into priority deci-
sions. At an airport with a single runway that is used jointly (but not simul-
taneously) by arriving and departing aircraft, a landing plane may be given
precedence over one waiting to take off, even though the departing plane was
the first to request permission from the tower. This is especially true for
an arriving jet whose tight budget of fuel is close to exhaustion.


A priority rule comes closest to being democratic in the familiar first-
come-first-served or first-in first-out discipline. But even this policy is
preferential, in the sense that it systematically favors the customer who has
been waiting the longest. In doing so, it safeguards against excessive waits
and controls the variance of waiting.


CONFLICTING OBJECTIVES 


There are at least three different, and usually conflicting objectives 
that can influence a priority decision and affect the design of a priority 
rule: 


1. Reduce average response time and number waiting, thus
increasing throughput or rate of output.

2. Acknowledge customer importance and urgency of request.

3. Serve in fair order and limit length of wait.


For the best average performance, the shortest-service-time-next rule may
be just right. But this rule allows a steady stream of short requests to
delay a long request indefinitely. The mean wait is minimized at the ex-
pense of the variance, and the special interests of the long user are sacri-
ficed for the general welfare. To accord special interests their due, a
balance must be struck among conflicting objectives. This requires a common
measure of performance that incorporates the interests of the individual.
Any contrived measure is going to be susceptible to challenge, but that is
healthier than grumblings about seeming arbitrariness.


COST RATE CURVES 


We choose an inverse measure of performance: the cost of delay. De-
pending on the context, it may be thought of as a disutility, penalty cost, loss
of goodwill, opportunity cost, postponement of revenue, customer dissatisfac-
tion, storage cost, poorness of service, or some equivalent. We first used
the cost measure several years ago during a study of message priority in a
communication system [1]. Each message (or customer) type was considered
to have a separate cost rate curve, as illustrated in Figure 1. These curves
plot rate-of-accrual of cost versus time of waiting. Curve (a) shows the
simplest case of cost accruing at a constant rate, c, throughout the period
that a message is waiting to be transmitted, coded, decoded, or, in general,
served. If the message waits a time w, the total cost added to the books of
the operation is cw. This is the case of linear costs. An order to replace
an aircraft engine for a grounded plane might have such a curve.


Curve (b) depicts a quadratic cost case. The longer a message waits, 
the greater its marginal cost. An important command from higher head- 
quarters might have either this characteristic or the exponential variations 
of it portrayed by curves (c) and (d). 


Curve (e) illustrates a message whose quick transmission is impor- 
tant, but whose timeliness decays exponentially. Examples might include 
the warning of a missile attack, or, on a different scale, a weather forecast. 


Curve (f) is an artist's rendition of what a general cost curve looks 
like. Note that in each of these examples the integral of the curve over the 
duration of wait equals the total cost attributable to the message. 
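
As a rough numerical illustration of the statement that total cost is the
integral of the cost rate curve over the wait, the sketch below integrates
three assumed curve shapes. The function names and the parameter values
(c, a, w) are illustrative assumptions and are not taken from Figure 1. The
rising curve is read here as a rate that grows linearly with the wait, which
gives a quadratic total.

```python
import math

def total_cost(rate, wait, steps=100_000):
    """Integrate a cost rate curve over [0, wait] with the midpoint rule."""
    h = wait / steps
    return sum(rate((i + 0.5) * h) for i in range(steps)) * h

c = 2.0   # assumed cost rate scale (cost per unit time)
a = 0.5   # assumed decay constant for the decaying case
w = 3.0   # assumed wait

def constant_rate(t):      # like curve (a): cost accrues at a constant rate
    return c

def rising_rate(t):        # marginal cost grows the longer the unit waits
    return c * t

def decaying_rate(t):      # like curve (e): timeliness decays exponentially
    return c * math.exp(-a * t)

linear_cost = total_cost(constant_rate, w)    # closed form: c * w
rising_cost = total_cost(rising_rate, w)      # closed form: c * w**2 / 2
decaying_cost = total_cost(decaying_rate, w)  # closed form: (c/a) * (1 - exp(-a*w))
```

With identical constant curves the total is simply c times the summed waits,
which is why minimizing cost and minimizing average wait coincide in that case.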

Assignment of cost rate curves to customers makes the conflicting 
objectives commensurable by reducing them to a common measure of sys- 
tem performance. The importance of a customer is reflected by the height 
of his curve. His aversion or intolerance to delay is represented by the 
shape of his curve. Minimizing the total cost accrued by all customers be- 
comes the single system objective. 


If all customers have identical cost rate curves, and if this curve is 
constant as in the first case of Figure 1, then minimizing cost accrual is 
equivalent to minimizing average wait. Thus, in this very simplest case, 
introduction of the cost measure does not complicate the traditional analysis. 
We shall see, however, that the complexity of the analysis rises sharply 
with even small complications to the cost measure. 


THE c/t RULE 


The next simplest case is that of constant cost-rate curves, where the
constants are different for different customers. In a classical single-server
situation, this leads to what has been called the cμ or c/t rule. That is, if c_i
is the cost rate of the ith unit waiting for service, and t_i is its expected service
time (μ_i = 1/t_i is its service rate), then the unit with highest c_i/t_i should be
served next. When all c_i are equal, servicing in ascending order of t_i is
optimal and the c/t rule reduces to the rule of shortest-service-time-next.
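
The c/t rule can be checked on a small case by brute force. The sketch below
is only an illustration: the four (c, t) pairs are hypothetical, service
times are treated as known, and cost is counted until each unit completes
service (counting only until service starts shifts every ordering by the
same constant).

```python
from itertools import permutations

def total_linear_cost(order, jobs):
    """Sum of c_i times the completion time of unit i, for a given service order."""
    clock, cost = 0.0, 0.0
    for i in order:
        c, t = jobs[i]
        clock += t            # unit i accrues cost until its service completes
        cost += c * clock
    return cost

# Hypothetical (cost rate c_i, service time t_i) pairs.
jobs = [(1.0, 4.0), (3.0, 2.0), (2.0, 3.0), (1.0, 1.0)]

# c/t rule: serve in descending order of c_i / t_i.
ct_order = sorted(range(len(jobs)),
                  key=lambda i: jobs[i][0] / jobs[i][1],
                  reverse=True)

# Exhaustive search over all 24 orderings for comparison.
best_order = min(permutations(range(len(jobs))),
                 key=lambda order: total_linear_cost(order, jobs))

ct_cost = total_linear_cost(ct_order, jobs)
best_cost = total_linear_cost(best_order, jobs)
```

For these values the c/t ordering and the exhaustive search give the same
total cost.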


References on the c/t rule are available in many places [1, 4, 12].


Note that the dimensions of c/t are cost/(time^2). Thus, the c/t of a
unit is an acceleration that indicates how quickly the cost accrual velocity
could be reduced by serving that unit next. Past history is not relevant to
the priority decision in the linear case; nor is there interaction among units.
Unfortunately, these simplifications do not apply to the general nonlinear
case.


To examine the nonlinear case in a simple situation, suppose that at
time t_0 a choice must be made between two units waiting for service. The
first unit has a service requirement of t_1, and the second unit of t_2. Assume
that during the interval (t_0, t_0 + t_1 + t_2) both of these units are served and no
other units arrive.


The cost rate curves of the two units are given in Figure 2. If the
first unit receives priority, the cost accumulated during the interval is

    \int_{t_0}^{t_0+t_1} c_1(t)\,dt + \int_{t_0}^{t_0+t_1+t_2} c_2(t)\,dt        (1)

whereas, if the second unit receives priority, it is

    \int_{t_0}^{t_0+t_1+t_2} c_1(t)\,dt + \int_{t_0}^{t_0+t_2} c_2(t)\,dt        (2)

The first unit should be given priority if and only if expression (2) is greater
than or equal to expression (1). That is,

    \int_{t_0+t_2}^{t_0+t_1+t_2} c_2(t)\,dt \le \int_{t_0+t_1}^{t_0+t_1+t_2} c_1(t)\,dt        (3)

If we let \bar{c}_1 be the average value of c_1(t) during a period of duration t_2 begin-
ning at t_0 + t_1, and \bar{c}_2 be the average value of c_2(t) during a period of duration

Figure 2. Nonlinear Costs
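
The two-unit comparison of expressions (1) and (2) can be tried numerically.
The sketch below assumes that each unit accrues cost from the decision time
t_0 until its own service is complete, and it uses two hypothetical cost rate
curves (a flat one and one that rises with time); neither curve is taken from
Figure 2.

```python
def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

def first_unit_goes_first(c1, c2, t0, t1, t2):
    """True if serving unit 1 first accrues no more cost than serving unit 2 first."""
    cost_1_first = integrate(c1, t0, t0 + t1) + integrate(c2, t0, t0 + t1 + t2)
    cost_2_first = integrate(c1, t0, t0 + t1 + t2) + integrate(c2, t0, t0 + t2)
    return cost_1_first <= cost_2_first

def flat(t):          # hypothetical unit 1: constant cost rate
    return 1.0

def rising(t):        # hypothetical unit 2: cost rate grows with clock time
    return 0.25 * t

early = first_unit_goes_first(flat, rising, t0=0.0, t1=1.0, t2=2.0)
late = first_unit_goes_first(flat, rising, t0=10.0, t1=1.0, t2=2.0)
```

The same pair of units is ordered differently at the two decision times,
which illustrates why past history matters once the curves are not constant.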


JOB SHOPS AND DEADLINES 


The priority problem is found in too many places for us to make an
exhaustive list. All of us face priority problems every day of our lives, no
matter what our occupation. We shall focus on two areas where the priority
problem assumes central importance: job shops and computers.


Work on job-shop priorities has been going on for a number of years
and several good surveys exist. Carroll, for example, gives a comprehen-
sive overview of the field and presents some interesting new results that
illustrate the value of costing.


In a job shop, jobs are typically promised by a certain date called the
due date or deadline. Each job consists of a number of tasks to be per-
formed on different machines. Call this number N. Since a job must be
served N separate times, it is subject to N successive priority decisions:
when it has N tasks left to accomplish, when it has N-1 tasks left, and so on,
until it has only one task left.


Suppose that a pair of tasks from two different jobs are competing for
service at a machine. The first task requires less machine time than the
second, but the second job is in greater danger of missing its deadline.
Which job should be given priority?


Giving priority to the first job contributes to the objectives of high 
throughput and low average wait, but may cause a default on the second job's 
deadline. Giving priority to the second job has the opposite effect. This is 
the classical trade-off. Carroll's results indicate that cost curves can help 
in striking a balance. 


Carroll postulates a cost rate equal to the probability of a job's being 
late. In his model, this is the rate of change of the job's expected amount 
of lateness. The rate applies to the task of the job that is currently awaiting 
service. It increases monotonically toward a value of one, as the deadline of 
the job is approached, then remains at one from the deadline until the last 
task of the job has been completed.


The conceptual model is straightforward, but the probability of being
late is difficult to estimate accurately. Even if precise machine times for
all tasks are known with certainty, waits between task initiations can only be
approximated. The approximations should be based on current loads and re-
vised as loads shift. Moreover, estimated probability of being late for a job
should depend on the number of its remaining tasks, as well as on the
approximated distribution of wait for each of these tasks.

Carroll sets aside such refinements for later study, and chooses a
simple expedient. He estimates the sum of expected waits over the remain- 
ing tasks, and uses this estimate to postulate a threshold time beyond which 
he assumes the probability of being late rises linearly from zero to one at 
the deadline. With this probability as c, he invokes the c/t rule at each de- 
cision point. Theoretically, c should not vary, but in Carroll's model it is 
changing all the time. The end result of this compounding of assumption upon 
simplification upon approximation is a rule that produces consistently fewer
late jobs in simulations than any previously simulated rule.
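
A minimal sketch of the expedient just described follows. It is not Carroll's
implementation: the placement of the threshold (deadline minus the estimated
sum of remaining waits), the tie-break in favor of the shorter task, the field
names, and the sample tasks are all assumptions made here for illustration.

```python
def lateness_probability(now, deadline, expected_remaining_wait):
    """Assumed ramp: zero up to the threshold, rising linearly to one at the deadline."""
    threshold = deadline - expected_remaining_wait   # assumed threshold placement
    if now <= threshold:
        return 0.0
    if now >= deadline:
        return 1.0
    return (now - threshold) / (deadline - threshold)

def pick_next(tasks, now):
    """Apply the c/t rule with c set to the estimated probability of being late."""
    def priority(task):
        c = lateness_probability(now, task["deadline"],
                                 task["expected_remaining_wait"])
        # Assumed tie-break: among equal c/t values, prefer the shorter task.
        return (c / task["machine_time"], -task["machine_time"])
    return max(tasks, key=priority)

# Hypothetical competing tasks at one machine.
tasks = [
    {"name": "short", "machine_time": 1.0, "deadline": 20.0,
     "expected_remaining_wait": 5.0},
    {"name": "urgent", "machine_time": 3.0, "deadline": 10.0,
     "expected_remaining_wait": 6.0},
]

choice_early = pick_next(tasks, now=2.0)["name"]   # neither job is at risk yet
choice_late = pick_next(tasks, now=8.0)["name"]    # the second job is at risk
```

Because c is recomputed at every decision point, the priority of a job drifts
upward as its deadline approaches, which is the sense in which c "is changing
all the time".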


The implications of this work extend beyond the scheduling of job 
shops. As one example, there is a decided trend today toward greater com- 
plexity in the organization of computer systems. It is a safe guess that 
future computer systems will consist of many processors, many separate 
memory modules, many input-output terminals and data coordinators, with 
flexible interconnections, simultaneous users, and concurrent operation. 
Lessons learned from the study of job shops will carry over to questions of 
scheduling space, time, and program access in these computer systems of
the future.


THE COMPUTATION CENTER 


The computer field is beginning to show greater interest in the priority 
problem as machine structures grow more complex, but some concern with 
the problem has existed in the field from the very start. Historically, as 
soon as a computation center had two users, it also had priority decisions to 
make. Most open-shop centers provided an informal solution: sign-up 
sheets. First on the sheet was first on the machine, or at least had first 
choice. 


Other centers used more elaborate procedures. The IBM 701 Scien- 
tific Computing Service in New York City, circa 1954, had a dispatcher who 
sat on a glass-enclosed balcony overlooking the 701 computer. One or more 
eager customers sat alongside her, ready to pounce whenever their prede- 
cessor on the machine signalled completion (or frustration) and punched out 
on the IBM time clock at the console. The dispatcher made certain that the 
queue on the balcony was never empty by phoning users in their offices and 
alerting them well ahead of time. A carefully-designed priority formula 
allowed certain customers, like Los Alamos, to gain access to the computer 
on very short notice. 


Even this did not prevent momentous inefficiency and idle time. As 
computers became more advanced and more expensive, inefficiency had to 
be wrung out of the operation. The larger computation centers became 
closed-shop, and jobs were batched serially on tape before run time to pro- 
vide fast transition between successive executions. 


At heavily-loaded centers, such as the M.I.T. Computation Center in
1958, turnaround times soared to several days, and sometimes to a week or
more, despite batching and the use of professional operators. Priority rules
gave some relief to special users. Urgent needs, such as the Sputnik orbital
calculations (which provided settings for camera and telescope stations
around the world) were given preemptive rights. Other business ceased
when a satellite was launched. In normal periods, short jobs were awarded
express service at prescribed times of day, and very long jobs were deferred
to night-time shifts.

TIME SHARING


Expediting service for the short user just whetted the appetite for 
more frequent access. Freeing a modest-sized program of errors by means 
of the customary iteration of running, modifying, running, and modifying, 
could require weeks or months in a batch-processed operation, as compared 
to an afternoon or evening spent at a computer (such as M.I.T.'s TX-0 or
TX-2) which the programmer had to himself for a while. A private com-
puter not only saved the user time, it also allowed him to interact with his
program, view preliminary results, and alter experimental strategy on-line [10].


The recent development of time-sharing gives the impression and 
advantages of a private computer to simultaneous users at remote consoles 
of a large computer?+9+8_ This development is a concession to the user and 
recognizes his point of view. Overall system objectives suffer initially, but 
equipment being built with new system concepts will restore and ultimately 
improve past levels of operating performance. 


If we view the purpose of time sharing as the creation of a privileged 
class of user for whom the computer is continually accessible and immedi- 
ately responsive, then, in the spirit of traditional express service, we are 
led naturally to establish the short request as the privileged user. It is im- 
possible, by definition, to serve a long request instantly. Moreover, to the 
extent that the computer does give precedence to long requests, its respon- 
siveness to short requests is degraded and the purpose of time sharing is 
undermined. 


Thus, we tend to favor the class of requests for whom we can offer 
fast service. This produces an unusual situation. User and system objec- 
tives are of a single mind in that both direct us to favor the short request, 
assuming that we can identify it ahead of time. Unfortunately, we generally 
cannot. It is only after execution that we really know which was the short 
request and which was the long request. 


The typical time sharer sits at his console for an hour or more issu- 
ing requests, being served, making inquiries, and receiving replies. We do 
not want to ask him before each interaction how much time he expects to 
take for two reasons. First, we could not place much faith in his response, 
because the mind consciously or subconsciously tends to underestimate re-
quirements. Second, time sharing is at its best when the computer and its
characteristics are inconspicuous. The user should not have to be aware of 
his consumption of computer time for each step he takes. 

Can the computer anticipate time requirements without being told?
Yes, to some extent; especially when a request is a standard command or
program which the computer knows by name. A scheduling based on such
information would be context-sensitive, to borrow a term. M.I.T.'s time-
sharing supervisor infers the size of a program from its name, but does not
make inferences about its expected time of execution [13]. For the following
mathematical discussion, we shall adopt this conservative position of assum-
ing no prior knowledge of request time.

ROUND-ROBIN SCHEDULING


Discriminating against long requests, when we are able to identify 
which ones are long, prevents them from delaying short requests. When we 
cannot identify them, we hedge. We serve a request for some fixed amount 
of time, called a quantum. If that is not sufficient, we suspend its service, 
place the balance of the request at the end of the queue, and go on to the next 
request. If the quantum is sufficient, we give the unit only enough time to 
complete its service. This type of priority rule has been called round-
robin scheduling [10].


Under round-robin scheduling, the longer request is split or partitioned 
during execution, and its segments served at separate times. In general, 
there are two instances when it may pay to partition jobs in a service opera- 
tion: when there is more than one server, and when there is uncertainty in 
service times!?. Only the second case applies here, since we shall be assum-
ing a single processor, but both cases are relevant to future time-sharing
systems with multiple processors.


To illustrate how partitioning can be helpful under conditions of un- 
certainty, suppose that half of all requests are for 1 second of computer 
time, and the other half for 10 seconds. The computer does not know which 
is the 10-second request before execution, but by partitioning with a quantum 
of 1 second it acquires this information at a charge of 1 second, plus an 
additional overhead charge of V occasioned by suspension of service on the 
incomplete request. Thus a 1-second request arriving for service immedi- 
ately after a 10-second request is detained by its predecessor 1+V seconds 
rather than 10 seconds. The 1+V charge is like a cost of information, al-
though its 1-second component does subtract from the 10-second request.
Partitioning contributes to the system objective of lower average wait, as
well as to the user objective of better response time for the short request,
giving a double advantage in this example.
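
The arithmetic of this example can be sketched as follows. The overhead value
V = 0.1 second is an assumption chosen only to make the numbers concrete, and
the helper names are hypothetical.

```python
def delay_behind(predecessor, quantum, overhead):
    """Delay imposed on a newly arrived request by the single request ahead of it."""
    if predecessor <= quantum:
        return predecessor            # predecessor finishes within its quantum
    return quantum + overhead         # predecessor is suspended after one quantum

def expected_delay(quantum, overhead, sizes=(1.0, 10.0)):
    """Average delay when the predecessor is equally likely to be of either size."""
    return sum(delay_behind(s, quantum, overhead) for s in sizes) / len(sizes)

V = 0.1  # assumed overhead per suspension, in seconds

behind_long_fcfs = delay_behind(10.0, quantum=10.0, overhead=V)  # whole 10 seconds
behind_long_rr = delay_behind(10.0, quantum=1.0, overhead=V)     # 1 + V seconds

average_fcfs = expected_delay(quantum=10.0, overhead=V)
average_rr = expected_delay(quantum=1.0, overhead=V)
```

With a 1-second quantum the expected delay behind an unknown predecessor is
1 + V/2 seconds, against 5.5 seconds when the predecessor runs to completion.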


Now we consider an example where partitioning does not even offer a 
single advantage. Suppose that all requests are known in advance to be for 
10 seconds of computing time. Round-robin scheduling with a quantum of 1 
second is then absurd. It introduces unnecessary delays without providing 
any new information. Round-robin scheduling with a quantum of 10 seconds 
(or greater) makes better sense, but is nothing more than a first-come-
first-served rule.


These observations lead us to the first of a few informal points that
will be made without proof or detailed discussion.

Point 1 The benefits of round-robin scheduling (RRS) relative
to first-come-first-served (FCFS) increase with the 
uncertainty of request sizes. The variance of the dis- 
tribution of request sizes may be taken as a measure 
of this uncertainty. 

In general, we consider both request size and the user's think time at 
his remote console as random variables. The think time is defined to be the 
interval between completion of a user's request and initiation or arrival at 
the processor of his next request. The time-shared operation is character- 
ized by periods during which one request is in service and others may be in 
queue, called busy periods, and periods during which no request is either in 
service or in queue, called idle periods. Idle periods may be utilized by the
processor to nibble at deferred work stored in reserve.


We now consider another situation. Suppose that an idle period has 
just been terminated by the arrival, in quick succession, of four requests 
for service. The first request is for four quanta of time, the second for two 
quanta, the third for three quanta, and the fourth for one quantum. If the re- 
quests were served in their entirety in strict first-come-first-served order, 
the service intervals and service completions would be as in Figure 3a. The 
scales shown are in quantum units, and overhead is assumed to be negligible. 
If it were possible to serve the shortest job next, the service completions 
would be, instead, as in Figure 3b, whereas under round-robin scheduling, 
they would be as in Figure 3c. 


Figure 3. Service Completions Under Three Priority Rules
(axis label: unit served; panel c: Round-Robin Scheduling)
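
The four-request example can be replayed with a short sketch. It assumes, as
the text does, a quantum of one unit and negligible overhead; the function
names are illustrative only.

```python
from collections import deque

def fcfs(sizes):
    """Completion time of each request when served whole, in arrival order."""
    clock, done = 0, {}
    for i, size in enumerate(sizes):
        clock += size
        done[i] = clock
    return done

def shortest_job_next(sizes):
    """Completion times when sizes are known and the shortest is served first."""
    clock, done = 0, {}
    for i in sorted(range(len(sizes)), key=lambda i: sizes[i]):
        clock += sizes[i]
        done[i] = clock
    return done

def round_robin(sizes, quantum=1):
    """Completion times when each request gets one quantum per turn."""
    queue = deque(enumerate(sizes))
    clock, done = 0, {}
    while queue:
        i, left = queue.popleft()
        run = min(quantum, left)
        clock += run
        if left > run:
            queue.append((i, left - run))   # balance rejoins the end of the queue
        else:
            done[i] = clock
    return done

sizes = [4, 2, 3, 1]          # requests in order of arrival, in quanta
results = {rule.__name__: rule(sizes)
           for rule in (fcfs, shortest_job_next, round_robin)}
totals = {name: sum(done.values()) for name, done in results.items()}
```

Here `totals` holds, for each rule, the summed time in waiting and service,
and `results[name][3]` is the completion time of the one-quantum request.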

Notice the differences. The total time spent in waiting and service under
the first-come-first-served rule is

    W_FCFS = 4 + 6 + 9 + 10 = 29,

and the shortest request is completed at the end of quantum 10. Under the
shortest-job-next rule

    W_SJN = 1 + 3 + 6 + 10 = 20,

and the shortest request is completed at the end of quantum 1. Under round-
robin scheduling

    W_RR = W_FCFS = 29,

and the shortest request is completed at the end of quantum 4. Observe that
the RR and FCFS rules have identical schedules of service completions. The
identity is coincidental, although it suggests two general points, namely:


Point 2 When the sizes of service requests are exponentially 
distributed, spacings between service completions in 
the busy period under round-robin scheduling are also 
exponentially distributed with the same mean. The 
service completions under RRS may be thought of as 
reordered FCFS service completions, with completions 
of the shorter requests moving forward and those of 
the longer requests moving backward. Average through- 
put is unchanged if overhead is ignored. 


Point 3 When the sizes of service requests are exponentially 
distributed, the balances of these requests in excess 
of the quantum size are also exponentially distributed 
with the same mean. The same is true of the balances, 
and so on. The segments of partitioned requests from 
an exponential distribution all share a common trun- 
cated exponential distribution, regardless of their posi- 
tions in the partitioning. 
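
Point 3 rests on the memoryless property of the exponential distribution. A
quick simulation, offered only as a plausibility check and not as a proof,
is sketched below; the rate, quantum, sample size, and seed are arbitrary
assumptions.

```python
import random

random.seed(1)                 # fixed seed so the run is repeatable
sigma, Q = 1.0, 0.7            # assumed request rate (mean 1/sigma) and quantum

requests = [random.expovariate(sigma) for _ in range(200_000)]
balances = [r - Q for r in requests if r > Q]           # excess over one quantum
second_balances = [b - Q for b in balances if b > Q]    # excess over two quanta

def mean(values):
    return sum(values) / len(values)

mean_request = mean(requests)
mean_balance = mean(balances)
mean_second_balance = mean(second_balances)
```

All three sample means should lie close to 1/sigma, in line with the claim
that balances in excess of the quantum keep the original mean.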

AN ANALYTICAL MODEL


Points 2 and 3 provide the theoretical groundwork for an analytical 
model of round-robin scheduling. Let there be N consoles connected to a 
single time-shared processor. Requests for service are assumed indistin-
guishable between consoles and exponentially distributed with mean 1/σ
and density function

    f(R) = \sigma e^{-\sigma R},    R \ge 0        (9)


The scheduler employs a quantum of size Q which, by Point 3, implies a


cumulative distribution of segment sizes equal to

    F(S) = 1 - e^{-\sigma S},    Q > S \ge 0        (10)
         = 1,                    S \ge Q
Requests for service are assumed to arrive at the processor from thinking
consoles randomly, at a rate α per console. Thus, if J is the number of con-
soles waiting for service at some arbitrary time t, (N - J)α is the arrival
rate of requests at t. Thinking times are assumed indistinguishable between
consoles and exponentially distributed with mean 1/α and density function

    f(T) = \alpha e^{-\alpha T},    T \ge 0        (11)


The assumption of exponentially-distributed think and request times holds up 
well in a time-sharing operation where there is wide diversity of users, as 
there is at M.I.T.


The formulation permits us to employ the queuing model for machine
servicing (or interference) which was developed in 1933 and has had consid-
erable practical application since then [4, 6]. We draw an analogy between
time-sharing and machine servicing by viewing consoles in the role of
machines, and the computer processor in the role of a repairman who serv-
ices the machines. A console thinking is like a machine working, and a con-
sole waiting is like an idle machine in need of repairs. The machine-servic-
ing model extends easily to the case of several repairmen, and is therefore
applicable to a time-sharing system of multiple processors. This extension
would be useful for further development of the present work, and is employed
in the doctoral dissertation by Scherr [14].


An equivalent, but more subtle analogy exists between time-sharing
and calls coming into a telephone exchange containing N trunk lines and no
facility for holding. The arrival and server processes are interchanged in
this analogy. Waiting consoles become open trunk lines; thinking consoles
become busy trunk lines; the arrival of a request for computer time becomes
termination of a telephone call; and completion of a computer request be-
comes arrival of a telephone call when one or more lines are open. Calls
that arrive when all lines are busy do not enter the exchange, and are lost.
This model has been widely applied in the communications field and produces
the well-known loss formula attributed to Erlang. The finiteness of trunk
lines, like the finiteness of machines in machine servicing and the finiteness
of consoles in time-sharing, is a distinguishing characteristic of the model.


To apply the machine-servicing model to round-robin scheduling of a
time-shared computer, we must incorporate partitioning of service requests
and introduce overhead. Partitioning is represented by f(S), the density
function of segment sizes, whose first two moments are

    S_1 = \frac{1}{\sigma} (1 - e^{-\sigma Q})        (12)

    S_2 = \frac{2}{\sigma^2} (1 - e^{-\sigma Q} - \sigma Q e^{-\sigma Q})        (13)

Overhead is accounted for by adding the constant V to S_1, and (2 S_1 V + V^2) to
S_2. We may think of V as an average time required to bring a program into
and out of primary storage. The effect of V is to lengthen request sizes and
degrade the rate of processing. The smaller that quantum size Q is for a
given R, or the larger that request size R is for a given Q, the greater the num-
ber of segments into which R is partitioned and the more R is prolonged by
overhead. The mean of f(R'), the distribution of prolonged request sizes, is
given by

    1/\sigma' = 1/\sigma + V/(1 - e^{-\sigma Q})        (14)
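
The segment moments and the prolonged mean can be checked numerically. The
sketch below assumes that a segment is the smaller of an exponential request
(rate sigma) and the quantum Q, and that one overhead V is charged per
segment; the closed forms are the ones implied by those assumptions, and the
function names are illustrative.

```python
import math

def segment_moments(sigma, Q):
    """First two moments of min(R, Q) for R exponential with rate sigma."""
    q = math.exp(-sigma * Q)
    s1 = (1.0 - q) / sigma
    s2 = 2.0 * (1.0 - q - sigma * Q * q) / sigma ** 2
    return s1, s2

def prolonged_mean(sigma, Q, V):
    """Mean request size including one overhead V for each of its segments."""
    return 1.0 / sigma + V / (1.0 - math.exp(-sigma * Q))

def numeric_moments(sigma, Q, steps=200_000):
    """Midpoint-rule check: density on [0, Q) plus the probability mass at Q."""
    h = Q / steps
    m1 = m2 = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        weight = sigma * math.exp(-sigma * s) * h
        m1 += s * weight
        m2 += s * s * weight
    mass_at_q = math.exp(-sigma * Q)      # requests longer than one quantum
    return m1 + Q * mass_at_q, m2 + Q * Q * mass_at_q

s1, s2 = segment_moments(sigma=1.25, Q=0.8)
n1, n2 = numeric_moments(sigma=1.25, Q=0.8)
with_overhead = (s1 + 0.05, s2 + 2 * s1 * 0.05 + 0.05 ** 2)  # moments of S + V, V = 0.05
```

As Q grows, `prolonged_mean` tends to 1/sigma + V, since a request is then
served in a single segment.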


In practice, f(R) is only approximately exponential, and extending the
exponential assumption to include f(R') may not weaken the approximation
significantly. We take this liberty, even though there do exist non-exponen-
tial formulations for the machine-servicing problem [15]. Using σ' in the
simple exponential formulation, and letting P_J be the steady-state proba-
bility that J of the N consoles are waiting, we get

    P_J = \frac{(\sigma'/\alpha)^{N-J} / (N-J)!}{\sum_{i=0}^{N} (\sigma'/\alpha)^{i} / i!},    J = 0, 1, ..., N        (15)

P_J is a truncated Poisson probability. J=0 produces Erlang's loss formula.
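
The steady-state probabilities can be computed directly. The sketch below
uses the truncated-Poisson form of the single-repairman machine-servicing
model and checks it against Erlang's loss formula evaluated by the standard
recursion; the function names and the sample values of N, alpha, and sigma'
are assumptions.

```python
import math

def waiting_distribution(N, alpha, sigma_p):
    """P_J for J = 0..N: probability that J of the N consoles are waiting."""
    rho = sigma_p / alpha
    weights = [rho ** (N - J) / math.factorial(N - J) for J in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def erlang_loss(servers, offered_load):
    """Erlang's loss formula via the recursion B(n) = a*B(n-1) / (n + a*B(n-1))."""
    b = 1.0
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b

N, alpha, sigma_p = 10, 0.05, 1.0        # assumed illustrative values
P = waiting_distribution(N, alpha, sigma_p)
idle_probability = P[0]                  # no console waiting: processor idle
loss = erlang_loss(N, sigma_p / alpha)
```

In this reading P[0] is the probability that the processor is idle, and it
coincides with the loss probability of the analogous N-trunk exchange.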


When a request arrives at the processor from a console that has just
passed from thinking to waiting, an interval may elapse before its first seg-
ment begins receiving service. Call the expected value of this interval, Y_1,
its first cycle time. In the same manner, let Y_i be the expected value of the
interval from the time its (i-1)th segment is completed until the time its ith
segment is initiated, for i ≥ 2. Notice that the same Y_i is common to all re-
quests containing i or more segments.


To obtain an expression for Y_i, we modify the reasoning of Cobham
to take account of partitioning and the state-dependent arrival rate. The
probability that there are J consoles waiting (and in service) immediately
prior to an arrival is

    P(J | arrival) = P_{J+1} / (1 - P_0),    J = 0, 1, ..., N-1        (16)

This is also the probability that there are J+1 consoles waiting, given that
the system is busy, which implies that the expected number of consoles wait-
ing, immediately prior to an arrival, and the expected number of consoles
in line (exclusive of the possible one in service) when the system is busy,
are equal to each other and to

    L_B = N/(1 - P_0) - \sigma'/\alpha - 1        (17)

The expected number of consoles in line (exclusive of the possible one in
service) immediately prior to an arrival is

    L_1 = L_B - \frac{1 - P_0 - P_1}{1 - P_0}        (18)

The expected time to completion of the possible segment in service, given an
arrival, is

    Y_0 = \left[ \frac{S_2 + 2 S_1 V + V^2}{2 (S_1 + V)} \right] \left[ \frac{1 - P_0 - P_1}{1 - P_0} \right]        (19)

An expression for Y_1 is

    Y_1 = Y_0 + L_1 (S_1 + V)        (20)

which is the expected time to finish the segment in service, plus the expected
time to serve the segments in line ahead of the new arrival.
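
The first cycle time can be sketched numerically as below. This is one
reading of the derivation, not a definitive implementation: it assumes the
truncated-Poisson P_J of the machine-servicing model, takes the residual of
the segment in service as second moment over twice the mean, and uses
hypothetical parameter values.

```python
import math

def first_cycle(N, alpha, sigma, Q, V):
    """Quantities seen by an arriving request, under the assumptions stated above."""
    q = math.exp(-sigma * Q)
    s1 = (1 - q) / sigma                                # mean segment size
    s2 = 2 * (1 - q - sigma * Q * q) / sigma ** 2       # second moment of segment size
    sigma_p = 1 / (1 / sigma + V / (1 - q))             # rate of prolonged requests
    rho = sigma_p / alpha
    weights = [rho ** (N - J) / math.factorial(N - J) for J in range(N + 1)]
    total = sum(weights)
    P = [w / total for w in weights]
    in_service = (1 - P[0] - P[1]) / (1 - P[0])         # P(someone in service | arrival)
    L_B = N / (1 - P[0]) - rho - 1                      # waiting just before an arrival
    L_1 = L_B - in_service                              # the same, excluding the one in service
    Y_0 = (s2 + 2 * s1 * V + V * V) / (2 * (s1 + V)) * in_service
    Y_1 = Y_0 + L_1 * (s1 + V)
    return {"P": P, "L_B": L_B, "L_1": L_1, "Y_0": Y_0, "Y_1": Y_1}

# Hypothetical operating point: 10 consoles, 20-second mean think time,
# 1-second mean request, half-second quantum, 50 ms overhead.
result = first_cycle(N=10, alpha=0.05, sigma=1.0, Q=0.5, V=0.05)
```

With a single console (N = 1) the arriving request never finds anyone ahead
of it, and the sketch returns a first cycle time of zero, as it should.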


Let us tag the new arrival and follow its progress through the system.
After its first segment has completed service, assuming there is a positive
balance remaining, the balance returns to the end of the line. Call the ex-
pected number of segments ahead of the tagged balance, including the one
that is entering service, L_2. Here, L_2 is the sum of the expected number of
the L_1 requests which were not completed during the first cycle, plus the
expected number of new arrivals during the first cycle while the tagged unit
was in service.


Thus,

    L_2 = L_1 e^{-\sigma Q} + \alpha_1 [ L_1 (S_1 + V) + (Q + V) ]        (21)

where α_1 is the average arrival rate from the beginning of the first cycle
through the beginning of the second cycle. The actual arrival rate depends on
the number of consoles thinking, and hence changes with each new arrival and
each service completion. We estimate its mean by averaging its value at the
beginning of cycle 1 with its value at the beginning of cycle 2. For simplicity,
we ignore the possible unit initially in service.

    \alpha_1 = \alpha [ N - 1 - (L_1 + L_2)/2 ]        (22)

Solving equations (21) and (22) for L_2 (and Y_2), and replacing the subscript 2
by i, gives

    L_i = \frac{L_{i-1} e^{-\sigma Q} + \alpha (N - 1 - L_{i-1}/2)(Y_{i-1} + Q + V)}{1 + \alpha (Y_{i-1} + Q + V)/2}        (23)

    Y_i = L_i (S_1 + V),    i = 2, 3, ...        (24)


For a still simpler approximation to Y_i, we can assume that all cycles
after the first have the same average length, and set this length equal to

    Y_B = L_B (S_1 + V)        (25)

The approximations (24) and (25) for Y_i were both used to cost round-robin
strategies with parameter values from the M.I.T. operation, and yielded costs
within zero to five percent of each other. The approximations compare well
with results from simulations and statistics from actual running experience.
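
The cycle-by-cycle recursion of (23) and (24) is easy to iterate. The sketch
below is an illustration under stated assumptions: the starting value L_1 and
the other parameters are hypothetical, and the first cycle is taken as
L_1 (S_1 + V), ignoring the unit initially in service, unless a value for Y_1
is supplied.

```python
import math

def later_cycles(L_1, N, alpha, sigma, Q, V, count=5, Y_1=None):
    """Iterate the recursion for (i, L_i, Y_i), i = 2, 3, ..., count + 1."""
    s1 = (1 - math.exp(-sigma * Q)) / sigma       # mean segment size
    L_prev = L_1
    Y_prev = L_1 * (s1 + V) if Y_1 is None else Y_1
    rows = []
    for i in range(2, count + 2):
        span = Y_prev + Q + V                     # previous cycle plus own full segment
        L_i = ((L_prev * math.exp(-sigma * Q)
                + alpha * (N - 1 - L_prev / 2) * span)
               / (1 + alpha * span / 2))
        Y_i = L_i * (s1 + V)
        rows.append((i, L_i, Y_i))
        L_prev, Y_prev = L_i, Y_i
    return rows

# Hypothetical values chosen only to exercise the recursion.
rows = later_cycles(L_1=3.0, N=20, alpha=0.03, sigma=1.0, Q=0.5, V=0.05, count=4)
```

Comparing the successive Y_i in `rows` with a single common cycle length is
one way to see how much the simpler approximation gives away.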
COSTING THE MODEL


Analytical studies of time sharing have tended to conclude with de- 
rivations of operating efficiency, average number waiting, and average 
delay? 11,14. This leaves the matter of optimum quantum size still ambigu- 
ous. We shall approach the problem directly by applying the earlier discus- 
sion of cost curves to the model just proposed. 


We postulate that a request of size R waiting a time W adds $C_R W$ to
the cost accumulated by the operation. The cost rate $C_R$ is chosen to be
constant with respect to W, corresponding to the linear case depicted by
Figure 1a. This choice permits us to use the preceding expected value
arguments.


To express preference for short requests, we let $C_R$ be a monotonically
non-increasing function of R. If a request of size R contains k segments for
a given Q, then its expected wait, exclusive of its own service, but including
its overhead, is

$$ W_k = \sum_{i=1}^{k} Y_i + kV \tag{26} $$
A measure of total cost is therefore

$$ C = \sum_{k=1}^{\infty} \left[ \sum_{i=1}^{k} Y_i + kV \right] \int_{(k-1)Q}^{kQ} C_R f(R)\, dR \tag{27} $$
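In particular, if

$$ C_R = \frac{\sigma + \gamma}{\sigma}\, e^{-\gamma R} \tag{28} $$

and we use approximation (25) for $Y_i$ ($i \ge 2$), then

$$ C = Y_1 + \left[ \frac{L_1 (S_1 + V)}{e^{(\sigma+\gamma)Q} - 1} \right] + \left[ \frac{V}{1 - e^{-(\sigma+\gamma)Q}} \right] \tag{29} $$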


The exponential cost function (28) has the effect of steepening the pitch of
f(R) while maintaining its exponential form. It is as though we pretend that
small requests are more numerous than they really are, then use average
wait to measure the performance of the operation.
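
The step from (27) to (29) can be filled in as follows. This is a sketch under
stated assumptions: it takes the request density to be exponential,
$f(R) = \sigma e^{-\sigma R}$, as the factor $e^{-\sigma Q}$ in (21) suggests; it reads
(28) as $C_R = \frac{\sigma+\gamma}{\sigma} e^{-\gamma R}$; and it writes $Y$, a symbol
introduced here only for brevity, for the common length (25) of all cycles
after the first.

```latex
\begin{align*}
C_R\, f(R) &= (\sigma+\gamma)\, e^{-(\sigma+\gamma)R}, \\
\int_{(k-1)Q}^{kQ} C_R\, f(R)\, dR &= p^{\,k-1}(1-p),
  \qquad p = e^{-(\sigma+\gamma)Q}, \\
C &= \sum_{k=1}^{\infty} \bigl[\, Y_1 + (k-1)\,Y + kV \,\bigr]\, p^{\,k-1}(1-p) \\
  &= Y_1 + \frac{p}{1-p}\, Y + \frac{1}{1-p}\, V \\
  &= Y_1 + \frac{Y}{e^{(\sigma+\gamma)Q}-1} + \frac{V}{1-e^{-(\sigma+\gamma)Q}} .
\end{align*}
```

The weights $p^{k-1}(1-p)$ form a geometric distribution over the number of
segments k, so the two sums are its mean less one and its mean. As Q grows
without bound, p tends to zero and C tends to $Y_1 + V$.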


Expression (29) has been used to compare the cost implications of
different choices of Q. Notice that an infinite quantum, corresponding to a
FCFS discipline, produces $C = Y_1 + V$, independent of $\gamma$. In the calculation
of C as a function of Q, values for $\alpha$ and $\sigma$ observed by Scherr¹⁴ were
combined with a range of values for N, V, and $\gamma$. The results are shown as
log-log plots in Figure 4.
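
The closed form (29) can be checked against a direct evaluation of the sum in
(27). The sketch below assumes exponentially distributed requests with rate
sigma, so that the weight on k segments is geometric. Y1 and Y (the first
cycle length and the common length of later cycles) are treated as given
numbers, although in the model they themselves depend on Q. All names are
hypothetical.

```python
import math

def cost_closed_form(Y1, Y, V, Q, sigma, gamma):
    """Closed-form cost in the shape of (29), for given Y1 and Y."""
    r = (sigma + gamma) * Q
    return Y1 + Y / math.expm1(r) + V / (-math.expm1(-r))

def cost_by_summation(Y1, Y, V, Q, sigma, gamma, kmax=2000):
    """Truncated sum in the shape of (27), with Y_i = Y for i >= 2.

    Assumes C_R * f(R) = (sigma+gamma) * exp(-(sigma+gamma) * R), so the
    integral over the k-th segment is p**(k-1) * (1 - p).
    """
    p = math.exp(-(sigma + gamma) * Q)
    total = 0.0
    for k in range(1, kmax + 1):
        weight = p ** (k - 1) * (1.0 - p)   # share of requests with k segments
        wait = Y1 + (k - 1) * Y + k * V     # equation (26) with Y_i = Y
        total += wait * weight
    return total
```

For a very large quantum the closed form returns Y1 + V, the FCFS value noted
in the text.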
Figure 4. Cost Performances as a Function of Quantum Size; (N and $\gamma$ parametric)
[Log-log plot; horizontal axis: Q, in seconds]


The larger $\gamma$, the more we are advised to favor short requests and the
smaller is the optimal quantum. The steepness of the curves in Figure 4 to
the left of their minima is due to V, and declines as V approaches zero. This
is shown better by the semi-log plot in Figure 5 for $\gamma=\sigma$ and N=30. The
relative flatness of the curves to the right of their minima suggests that it is
better to be high in the selection of Q, rather than low, except when V is
negligible.

As V increases, the maximum cost saving possible from partitioning 
diminishes, and the optimal quantum grows in size. This is hardly surpris- 
ing, since overhead is pure cost to the operation. In the absence of overhead 
(V=0), partitioning is able to favor the short requests without degrading 
average service. In fact, when V=0, the cost can be reduced by as much as

$$ 1 - C(Q=0)/C(Q=\infty) = 1 - (1 + \gamma/\sigma)^{-1} \tag{30} $$

a cost saving of 50 percent for $\gamma=\sigma$, up to a theoretical limit of 100 percent
as $\gamma$ goes to infinity. When V > 0, however, even though partitioning still
benefits some requests, average service must suffer; the smaller Q, the more
it suffers.


The requests that benefit are those whose expected wait is smaller
than the average FCFS system wait. That is, those requests for which $W_k$
from equation (26) is less than

$$ W_{FCFS} = L_R/\sigma' \tag{31} $$

where $L_R$ is given by equation (17), with $Q=\infty$ and $\sigma'=(1/\sigma+V)^{-1}$.
Equation (31), with $\sigma'$ calculated from equation (14), can be used to infer
the average RRS system wait for finite Q. This wait increases monotonically
with decreasing Q when V > 0.




Figure 5. Cost Performances as a Function of Quantum Size; (N and $\gamma$ fixed)
[Semi-log plot; horizontal axis: Q, in seconds]
MULTIPLE LEVELS AND VARIABLE QUANTA


Round-robin scheduling, as we have been discussing it, is a little like
traffic flow on a single-lane highway. A faster car must wait until it is safe
to pull out and pass a slower car. Throughout the duration of its travel, a
slower car delays all of the faster cars that come upon it.


This is in contrast to a multiple-lane highway where the slower cars
can stay to the right, out of the way of the speedier ones. The multiple-lane
highway has a counterpart in round-robin scheduling. It is a scheduler
with more than one level of priority. The multi-level type of scheduler was
proposed by Corbató⁶ in one of the first papers on time-sharing.


With a multi-level scheduler, if a request is not completed within its
quantum, its balance falls to the next priority level and is not served until
everything ahead of it has been served, including new arrivals at the higher
levels. A request arriving at a higher level (than the one occupied by the
request in service) may preempt this request immediately, or at the end of
some prescribed time, such as the quantum of the higher level. Each level
may have a different quantum associated with it. To reduce overhead,
Corbató suggested quantum sizes increasing exponentially for each lower
level of priority. He also recommended discriminating against a new
arrival whose program size was large, since larger programs have greater
overhead requirements than smaller programs. Small programs may be
entered at the highest priority level, medium-sized programs at the
next-highest level, and so on.
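
The mechanics of a multi-level scheduler can be sketched in a few lines. This
is a simplified illustration, not Corbató's scheduler. It assumes that all
requests are present at the start, that there are no later arrivals and hence
no preemption, that overhead is zero, that every request enters at the top
level regardless of program size, and that the quantum doubles at each lower
level. The function name and its parameters are hypothetical.

```python
from collections import deque

def multilevel_schedule(requests, base_quantum=1.0, levels=4):
    """Serve (name, service_time) pairs; return (name, completion_time) pairs.

    Simplifying assumptions: no arrivals after time zero, no overhead,
    and a quantum of base_quantum * 2**level at each priority level.
    """
    queues = [deque() for _ in range(levels)]
    for name, need in requests:
        queues[0].append((name, need))          # everyone starts at the top
    clock = 0.0
    finished = []
    while any(queues):
        # Always serve the highest-priority level that has work waiting.
        level = next(i for i, q in enumerate(queues) if q)
        name, need = queues[level].popleft()
        run = min(need, base_quantum * 2 ** level)
        clock += run
        if need - run > 1e-12:
            # Unfinished balance falls to the next lower level.
            lower = min(level + 1, levels - 1)
            queues[lower].append((name, need - run))
        else:
            finished.append((name, clock))
    return finished
```

With requests a, b, and c needing 0.5, 3.0, and 1.0 seconds, the long request
b is passed by both short ones and finishes last, at 4.5 seconds.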


Going to multiple levels of priority, and then to different quanta for 
each level, presents two additional degrees of freedom to the scheduling 
problem. A third element of flexibility is obtained by allowing the size of 
quanta to vary with the state of the system. Thus, when there are relatively 
few consoles waiting, we might lengthen the quantum in order to reduce 
overhead and response time for the request in service. The request at the 
end of the line is still served without undue wait. 


Conversely, if the system is heavily congested, we might wish to
shorten the quantum to allow for the possibility that some of the later
requests are small ones. This maneuver does preserve reasonable response
times for a privileged few, but has the unfortunate effect of further
degrading the processor just when its speed is needed most.
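
One way to make the quantum depend on the state of the system is to hold the
length of a full cycle near a target value. The rule below is a hypothetical
illustration only; the text does not prescribe a formula. The target cycle
length, the overhead, and the bounds on the quantum are assumed parameters.

```python
def state_dependent_quantum(n_waiting, target_cycle=2.0, overhead=0.05,
                            q_min=0.1, q_max=2.0):
    """Return a quantum that shrinks as the line grows.

    Hypothetical rule: choose Q so that n_waiting * (Q + overhead) is
    about target_cycle, then clamp Q to the range [q_min, q_max].
    """
    if n_waiting <= 0:
        return q_max                    # nobody waiting: longest quantum
    quantum = target_cycle / n_waiting - overhead
    return max(q_min, min(q_max, quantum))
```

The lower bound q_min reflects the warning above: when the line is long, a
very small quantum spends most of the processor on overhead.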


A quantum that is state-dependent is familiar to those of us who think
of our work schedules as round-robin in spirit. We spend as much time as
we can on the task of greatest immediate importance, finishing it if possible
without getting too far behind in other responsibilities that accumulate. We
make our priority decisions heuristically, and adapt to changing work loads.
This flexibility can be imitated by a computer. The decision procedure of
the computer need not be rigid, although rigid procedures do have certain
advantages. They tend to make analysis simpler, and are generally more
economical in computational requirements. Excessive flexibility can cost
more than it saves.
CONCLUDING REMARKS


It is time to look back over the road we traveled and raise certain 
questions that have been put aside. We began by introducing a general method 
of costing a service operation, and sketched its application to the priority 
scheduling of a job shop. Then we discussed the simple round-robin schedul- 
ing of a time-shared computer system, and used expected-value arguments 
and linear costing to measure its performance as a function of quantum size. 
Now we have just mentioned possible extensions to the simple round-robin 
procedure. Our main energy has been devoted to finding better ways of 
favoring the short request. 


As the lot of the short request improves, that of the longer request
must worsen, assuming that average performance gets no better. Does
linear costing give adequate attention to the growing wait suffered by the
longer request? No, if we believe that the second minute of waiting is worse
than the first. To take account of such nonlinearities, we must do nonlinear
costing, using curves like (f) of Figure 1.


Nonlinear costing cannot be accomplished analytically with expected- 
value arguments, but it is easily applied to the results of a simulation or to 
actual operating statistics. Each wait is costed as it is recorded, by means 
of an appropriate cost curve. Choice of a cost curve may have to be some- 
what arbitrary, but it can also be reasonable. This is illustrated by our 
selection in the expected-value analysis of a constant cost rate whose loga- 
rithm was negatively proportional to size of request. 
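
Costing each wait as it is recorded takes only a few lines. The sketch below
is illustrative: the quadratic curve is an assumed stand-in for a convex cost
curve, not the curve (f) of Figure 1, and all names are hypothetical.

```python
def total_cost(waits, cost_curve):
    """Cost each recorded wait with the given curve and add up the results."""
    return sum(cost_curve(w) for w in waits)

def linear_curve(rate):
    """Linear costing: each unit of waiting costs the same."""
    return lambda wait: rate * wait

def convex_curve(rate):
    """Assumed nonlinear costing: later waiting costs more than earlier."""
    return lambda wait: rate * wait ** 2

# Two hypothetical sets of recorded waits with the same average.
even_waits = [2.0, 2.0]
uneven_waits = [0.5, 3.5]
```

Under linear_curve(1.0) both sets of waits cost 4.0. Under convex_curve(1.0)
the even waits cost 8.0 and the uneven waits cost 12.5, so the nonlinear
curve registers the long wait that the linear one averages away.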


One approach to nonlinear costing is to postulate for each request a
desirable response time or deadline as in the job shop. This leads to cost
curves that are step functions. Deadlines are assigned on the basis of
name of command, importance of problem, nature of use, or some combination
of such factors. Priorities may then be awarded by a c/t rule, where c
reflects the probability that a request will miss its deadline, and t is an
estimate of the expected time to complete the request. Round-robin
scheduling provides a hedge on this estimate. Quantum sizes can vary with the
number of users waiting and the imminence of a deadline.
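
A c/t rule amounts to serving next the request with the largest ratio of c to
t. The sketch below assumes that both quantities have already been estimated
for every waiting request; how they are estimated is left open. The class and
function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    miss_probability: float   # c: chance the request misses its deadline
    expected_time: float      # t: estimated time to complete the request

def next_request(waiting):
    """Pick the waiting request with the largest c/t ratio."""
    return max(waiting, key=lambda r: r.miss_probability / r.expected_time)

waiting = [
    Request("regression", 0.9, 10.0),   # ratio 0.09
    Request("edit", 0.3, 2.0),          # ratio 0.15
    Request("compile", 0.5, 5.0),       # ratio 0.10
]
```

Here next_request(waiting) selects the short edit, even though the regression
is the most likely to miss its deadline, because the edit has the largest
ratio.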


Is shortness of request really a valid criterion for awarding priority?
The answer is yes, if we believe that the main purpose of time sharing is to
create quick access and brief response times for small users; no, if we
prefer to believe that responsiveness should be tailored to individual need.
Who should get better service: the highly interactive researcher who is
amplifying his creative powers by requesting lengthy statistical regressions
and complex data transformations in rapid succession as though they were
simple additions; or the casual user who requests minor editorial changes in
his program once every fifteen minutes or so? And what of the user who is
doing a little of both?


These are difficult questions. There are at least three ways to answer: 


1. We can shrug our shoulders at the multiplicity of different possi- 
bilities, and continue to operate in the manner assumed by the 
previous analysis, hoping that the strategy of favoring the short re- 
quest is best on the average. 


2. We can try to discriminate between more and less interactive users 
by shifting our attention from request sizes to think times, conjec- 
turing that the length of time a user pauses (or works) between 
successive requests indicates the quality of service he requires to 
keep him creative. 


3. We can accede to the special needs of highly interactive users with 
long requests, but insist that they identify themselves by making 
known their willingness to pay a premium price per unit of compu- 
tation. 

A pricing system can be based either on real money or budgeted com- 
puter allotments in dollar units. If the user is able to change his bid on-line, 
and if the computer favors the highest bidder, then time-sharing assumes the 
appearance of a one-sided auction market. As in the two-sided stock ex- 
change, the buyer submits either a limit bid for service at a particular price, 
or a market bid for service at the current price. Like a specialist on the 
floor of the exchange, the computer can keep the price stable by taking a 
position; namely, working on its reserve of deferred jobs whenever the price 
threatens to fall too low or move too abruptly. 


If requests are still partitioned under a pricing system, the price paid
can influence both a user's quantum and his priority level. Priorities and
prices are related concepts. They each serve to allocate limited resources.
A pricing-priority system makes the relationship explicit. In the process, it
permits the user to be party to the priority decision.